Self-Paced Versus Paced Evaluation Utilizing 
Computerized Tailored Testing 
by 

Wayne M. Patience and Mark D. Reckase 
University of Missouri-Columbia 

Abstract 

The research investigated the implementation of computerized tailored testing 
for the measurement of achievement under paced versus self-paced examination 
conditions. One hundred and seventy-two undergraduate students in an introductory 
measurement and evaluation course participated in the study. Students were 
randomly assigned to nine experimental groups consisting of combinations of 
two exams with the following testing schedules: paced tailored test, self-paced 
tailored test, and traditional paper and pencil test. Results on a comprehensive 
final were used as dependent measures. The tailored testing procedure was based 
on the simple logistic model. Attitudinal data were also incorporated in analyses. 

Objectives of the Inquiry 

The two primary objectives of the study described herein were 1) to determine 
the feasibility of implementing self-paced computerized tailored testing evaluation 
methods in an undergraduate measurement and evaluation course, and 2) to investigate 
possible differences in achievement levels under a paced versus self-paced testing 
schedule. A maximum likelihood tailored testing procedure based on the simple 
logistic model had previously been used for evaluation in this course; however, 
scheduling of the testing sessions had been determined by the instructor. 
The basic thrust of the initial question addressed the possibilities of having 
students determine when they would prefer to take the exams. Availability of 
alternate forms is dramatically increased inasmuch as tailored testing will 
usually not administer exactly the same test twice. The second question to be 
investigated was whether or not there would be significant differences in 
achievement level between students allowed to schedule their exams and those whose 
exams were scheduled by the instructor. 



Paper presented at the Annual Meeting of the National Council on Measurement in 
Education, Toronto, 1978. This research was supported by Contract Number 
N00014-77-C-0097 from the Personnel and Training Research Programs of the Office 
of Naval Research and a University of Missouri Research Council grant. Mark 
D. Reckase was principal investigator for both grants. 
Secondary objectives included an investigation of two additional questions. 
Are there differences in achievement levels of students taking paper-pencil tests 
and those taking exams via computerized tailored testing? Do differences exist 
in student attitudes toward paper-pencil tests, paced tailored tests, and 
self-paced tailored tests? A four item Likert type attitude questionnaire was given 
to determine student attitudes toward the testing procedures. A comprehensive 
final, which all students took at the same time under traditional paper and pencil 
conditions, assessed the overall achievement level for each student. 

Instrumentation 

All items administered on both the paper-pencil tests and computerized 
tests were of the multiple-choice variety. The items administered on the tailored 
tests were calibrated using the Rasch simple logistic model and stored in an 
item pool to be accessed by the procedure. The methods employed for item 
selection and ability estimation by the computerized tests relate the probability 
of a correct response to the ability of the person and the easiness of the 
item. Item pools were constructed of items determined to be of sufficient 
quality and content across the continuum of easiness. The item calibration 
derived from the simple logistic model yields one parameter, easiness, for each 
item. When an examinee is tested initially, the first item administered has a 
probability of .5 of a correct response for a person of average ability. If a 
correct response is obtained, the next item selected is more difficult. If the 
examinee's response is incorrect, an easier item is administered. Once both a 
correct and an incorrect response have been obtained, the maximum-likelihood 
procedure estimates ability using an iterative search for the mode of the likelihood 
distribution. The tailored test continues the cycle of selecting and administering 
items, recording the response pattern, and making ability estimates until the 
item pool has been depleted of appropriate items for the examinee's estimated 
ability, ability has been estimated with sufficient accuracy, or twenty items have 
been administered. For a more complete description of the tailored testing 
procedure, see Lord, 1970; Weiss, 1974; Reckase, 1974; or Patience, 1977. 
This procedure has been demonstrated to have reliability comparable to that of 
traditional paper and pencil tests, which administer many more items and thus 
require much more time (Reckase, Note 1). Also, test security is much less 
of a problem due to the previously cited readily available alternate forms. 
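
As a rough illustration of the procedure just described, the following Python 
sketch assumes that ability and easiness are both expressed on the log scale, so 
that the probability of a correct response is exp(ability + easiness) / 
(1 + exp(ability + easiness)). The function names, the Newton-Raphson search, and 
the rule of choosing the remaining item whose success probability is nearest .5 
are illustrative assumptions and are not taken from the authors' program. 

```python
import math


def p_correct(ability, easiness):
    """Simple logistic (Rasch) probability of a correct response.

    Assumes log ability and log easiness, so the logit is their sum.
    """
    return 1.0 / (1.0 + math.exp(-(ability + easiness)))


def ml_ability(responses, easinesses, start=0.0, tol=1e-6, max_iter=50):
    """Iteratively search for the mode of the likelihood (Newton-Raphson).

    responses  -- list of 1 (correct) / 0 (incorrect)
    easinesses -- log easiness of each administered item
    A finite estimate exists only with at least one correct and one
    incorrect response, which mirrors the condition stated in the text.
    """
    if not 0 < sum(responses) < len(responses):
        raise ValueError("need both a correct and an incorrect response")
    theta = start
    for _ in range(max_iter):
        probs = [p_correct(theta, e) for e in easinesses]
        gradient = sum(u - p for u, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def next_item(ability, pool):
    """Hypothetical selection rule: pick the item with P(correct) nearest .5."""
    return min(pool, key=lambda easiness: abs(ability + easiness))
```

Under these assumptions, an examinee who answers two of three equally easy items 
(easiness 0) correctly receives an ability estimate of ln 2, and a higher estimate 
leads `next_item` to pick a less easy item, which matches the "correct response, 
harder item" behavior described above. 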

The computer used in administering the tailored tests was an IBM 370/168 
with time sharing capability when linked with remote terminals via phone lines. 
The terminal used for display of the test items and recording of examinees' 
response patterns was a Beehive Mini-Bee II cathode ray terminal. 

Methods 

One hundred and seventy-two undergraduate students in an introductory course 
in measurement and evaluation participated in the study. Students were randomly 
assigned to the experimental groups, which consisted of the nine possible 
combinations of two exams with the following testing conditions: paced tailored 
test, self-paced tailored test, and traditional paper-and-pencil test. This 
pairing of exams with the three testing modalities provided the basis for studying 
the feasibility of implementing student self-pacing of their examinations. 
The students randomly assigned to the nine experimental groups consisted of those 
students who volunteered for the study. Students who did not volunteer for the 
experiment were also incorporated into the analyses as a "non-experimental" 
external control group. Results on a comprehensive final, which all students 
took in the traditional manner, were used as dependent measures along with the 
students' total score in the class. 

Depending upon the experimental group to which the student was randomly 
assigned, he or she took the first two exams in the course in one of the following 
conditions: exam one self-paced and exam two self-paced (SPSP), exam one 
self-paced and exam two paced (SPP), exam one self-paced and exam two traditional 
(SPT), exam one paced and exam two self-paced (PSP), exam one paced and exam two 
paced (PP), exam one paced and exam two traditional (PT), exam one traditional 
and exam two self-paced (TSP), exam one traditional and exam two paced (TP), 
and exam one and two both traditional (TT). The TT group and the non-experimental 
external control group (EC) were compared to determine whether differences 
existed between those who volunteered and those who did not volunteer. Students 
were informed via a handout with their name on it how they were to take the first 
two exams in the course. They were also acquainted with the procedure they were 
to follow depending upon how their exams were to be administered. If an exam was 
to be taken traditionally, the date was specified and they took the fifty item 
multiple-choice test in a group. If an exam was scheduled as paced, they were 
told to come in during a period amenable for them but within a specified time 
frame of a few days. If an exam was to be taken self-paced, the student was 
informed that he or she could come in to the tailored testing laboratory and 
schedule a time at which he or she would like to take that particular exam. Under 
the self-paced condition, students were permitted to take the exam as many times 
as they cared to until they were satisfied with the grade that they had achieved. 
Therefore, as was pointed out in the individualized instruction handout, a student 
could feasibly take a given exam even before instruction in the course had 
completed that unit. If they scored well on the tailored test over this material, 
as would be the case if a student was well versed in the material from past 
training and experience, they would most likely forgo attending the class during 
this particular set of instruction. 

The third exam for everyone was administered under traditional circumstances, 
i.e., paper and pencil, and at the same time in a large group. This comprehensive 
exam of one hundred items was broken down into three parts. Part one consisted 
of fifty items over the last one-third of the course. Part two of the exam had 
twenty-five items covering the first one-third of the course or exam one material, 
and part three consisted of twenty-five items measuring achievement of the middle 
one-third or exam two material. The total score on the comprehensive final was 
also recorded. 



The following data were collected on all of the experimental groups. On exams 
one and two, standard scores (Z) were recorded if the examinee took the traditional 
multiple-choice fifty item paper and pencil test. If the test was taken on the 
computer terminal under paced or self-paced conditions, log ability scores were 
recorded. Standard scores were recorded for each of the three parts as well as 
the total on exam three for all experimental groups. The log ability scores were 
converted to standard scores for the purpose of obtaining a total Z-score which 
consisted of two times the exam three total plus exam one score plus exam two score 
for each student in the course. The primary dependent variables utilized for 
evaluating possible achievement level differences included: 1) the total 
standard score for exam three taken as a whole (TOTAL), 2) the standard score for 
part one of exam three, which covered the last one-third of the course material 
(PART 1), 3) the standard score for part two of exam three, which was a retention 
measure of the first exam material (PART 2), 4) the standard score for part three 
of exam three, which was a retention measure for exam two material (PART 3), and 
5) the total score for the course (Total Z). Table 1 presents the cell means for 
each of these dependent measures for each of the testing conditions for exams one 
and two as well as the means for the external control group. 
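
The composite just described (two times the exam three total plus the exam one 
and exam two scores) can be sketched in Python as follows. The paper does not 
say how log ability scores were converted to standard scores, so the sketch 
assumes a plain z-transformation within a group using the population standard 
deviation; the function names are hypothetical. 

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores (e.g. log ability estimates) to z-scores.

    Assumption: population standard deviation within the group being
    standardized; the source does not state which form was used.
    """
    center = mean(scores)
    spread = pstdev(scores)
    return [(x - center) / spread for x in scores]


def total_z(exam1_z, exam2_z, exam3_total_z):
    """Course total: the comprehensive final counts twice, per the text."""
    return 2 * exam3_total_z + exam1_z + exam2_z
```

For example, a student with standard scores of 0.5 on exam one, -0.25 on exam 
two, and 1.0 on the exam three total would have a Total Z of 2.25. 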



Insert Table 1 about here 



Of special interest was whether or not significant achievement level differences 
existed between those students whose exams were scheduled by the instructor and 
those students allowed to schedule their own exams. 
Also of concern was whether differences existed among students' achievement when 
exams were administered traditionally with paper and pencil as opposed to exams 
administered via computerized tailored testing. With respect to this latter 
investigation, careful attention was directed to scaling the log ability estimates 
obtained from computerized tailored testing to the standard scores resulting from 
traditional paper and pencil testing. In addition to comparisons of achievement 
level for self-paced and paced tailored tests versus traditional testing among those 
students who volunteered and therefore were randomly assigned to experimental 
treatment conditions, achievement was compared between students who did not 
volunteer and those who did voluntarily participate. The external 
control group was, therefore, utilized in making a determination as to whether or 
not a selection effect occurred. The generalizability of results was thereby 
improved by the inclusion of the external control group in the analyses. While 
research investigating the operating characteristics of computerized tailored 
testing has been enhanced by utilization in actual classroom settings, students in 
previous studies were found to resent arbitrary assignment to experimental groups 
which were evaluated via computerized tests if they had not been given the opportunity 
to specify whether or not they were willing to participate in such a study. When 
grades have been assigned by innovative and unfamiliar methods, students have 
exhibited concern and apprehension. This may suggest an advantageous factor 
related to motivation of students when addressing the use of computerized testing 
in studies where grades were assigned on the basis of these tests, as opposed to 
simulated studies or research in which students participate and receive extra 
credit for merely taking part. 

Analyses of variance were performed for each of the respective dependent 
variables previously delineated. The five analysis of variance tables are presented 
below in Table 2 for the three by three factorial design with an external control 
group. The results presented have only three occurrences of significant F values. 
These included: differences among the session one testing (S1) conditions for 
dependent variable Part 1, and differences among the session two testing (S2) 
conditions for dependent variables Total and Part 3. 



Insert Table 2 about here 



Due to the compounding of the alpha error by repeated analyses of variance on the 
different dependent variables, at least one of the significant findings may be 
the result of chance error instead of the existence of a true difference. A 
venturesome postulate has been suggested by consideration of a contrasting trend. 
Across exam one conditions, the students tested traditionally tended to score a 
little better overall, whereas across exam two conditions, the paced computerized 
test group consistently tended to score higher taken as a whole. Therefore, if one 
were to hazard a discounting of one of the significant results, one could suggest 
that the difference across S1 for dependent variable Part 1 may not reflect a true 
difference. In terms of S2 conditions for dependent measures Total and Part 3, the 
results suggested that the paced tailored test group scored better than the 
traditionally tested group. 
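
The compounding of alpha error mentioned above can be made concrete with a short 
Python sketch. It assumes a per-test alpha of .05 (a level not stated in the text) 
and treats the five analyses as independent, which tests on correlated dependent 
variables are not, so the figure is only a rough guide. 

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false rejection across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Assumed alpha of .05 applied once per dependent variable (five analyses).
rate = familywise_error(0.05, 5)
```

Under those assumptions the chance of at least one spurious significant result is 
roughly .23, which is consistent with the caution that one of the three 
significant F values may reflect chance. 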

The findings, more importantly, supported the null hypothesis that overall 
differences between self-paced versus paced testing groups did not occur. There also 
did not appear to be significant overall achievement differences between individuals 
tested traditionally as opposed to those who were tested by the computerized 
test. None of the interactions of S1 and S2 for the respective dependent variables 
was significant. Also, the external control group's performance was not significantly 
different from that of the other nine groups. 

Aptitude data were collected where available. These consisted of the college 
grade point average for each of the junior and senior level students in the course. 
Missouri Placement Test scores, Missouri College Entrance Test scores, SCAT 
verbal, quantitative and total scores, and high school rank were also obtained 
when available. These aptitude measures were found not to be highly predictive 
of any of the dependent variables when analyzed by multiple regression procedures. 
Also, a high proportion of missing data on these aptitude measures resulted 
from incomplete University records. 

Whenever an exam was administered via tailored test on the computer terminal, 
the number of items given was recorded. If the student took an exam under 
self-paced scheduling conditions, the number of times the test was taken until he 
or she scored at a level that was satisfactory to the student was recorded. Students 
taking exams under traditional or paced scheduling conditions were allowed to take 
the exam only once. The mean number of items presented by the computerized tailored 
test was 12.6, representing a substantial reduction in number of items administered 
as well as time required to administer an individual test. Number of items did 
not have a significant correlation with the dependent variables cited earlier. 
This suggests that having been administered fewer items on the computerized tests 
did not adversely affect students' performance on any of the components of exam 
three or on total score for the class. With regard to the number of times students 
took the self-paced exams, the mean number of exams taken by self-paced students 
was less than two, suggesting that students under self-paced testing schedules did 
not take advantage of the provision of being able to take exams as many times as 
they desired in order to improve their scores. The maximum number of times a test 
was taken under self-paced conditions was four. 

Attitudinal data addressing preference of testing modality, i.e. traditional 
or computerized tailored test, was collected using a four item Likert type 
attitude questionnaire. The following dimensions were measured: time pressure, 
perceived difficulty, anxiety, and overall preference. Table 3 presents descriptive 
statistics in the form of frequency distributions. 



Insert Table 3 about here 



The totals for frequency of responses reflect some students who did not respond 
to the attitude items. Overall trends appear to suggest that students found 
the tailored test to have less time pressure and to be about as difficult as 
traditional tests. They were about equally divided as to amount of anxiety 
associated with the two testing modalities, and for the most part, overall 
preference was favorable to the computerized test. Attitude measures correlate 
significantly with one another but not with achievement measures. This has been 
found to be the case in other studies performed with this questionnaire and similar 
students who were tested with the same tailored testing procedure. 

Discussion and Conclusions 

The investigation of the feasibility of implementing self-paced scheduling of 
computerized tailored testing found the procedure to be a viable one. There was a 
tendency for students taking an exam under self-paced scheduling conditions to 
procrastinate, inasmuch as most students took their exam after the paced or 
traditional group had completed the exam. Although self-paced students were allowed 
to take the exam as often as they liked, there was not a tendency for them to score 
higher on overall achievement across the different treatment conditions. There 
was no evidence that suggested any major discrepancy between achievement level for 
students taking their exams by paper and pencil as opposed to on the computer terminal. 
Attitude data reflect that students did not find the tailored test to be objectionable 
on the dimensions measured, and to a large extent would prefer to take their exams 
on the computer terminal. 

One possible account of why senior level students in this particular 
study did not take full advantage of the self-paced condition was that the course 
itself was an eight week block class. This possibly did not provide enough time 
for students, who typically have not been acclimated to self-paced evaluation, to 
become accustomed to the possibilities provided by the procedure. Further research 
into this area of the flexibility of computerized tailored testing is needed. 

The most important educational implication of this study is that 
computerized tailored testing offers alternative measurement procedures for 
evaluating pupil achievement without substantial detrimental effects. Computerized 
tailored testing was found to be a viable method of self-paced evaluation, which is 
important inasmuch as educational programs are attempting to adapt to individual 
differences. This is especially true of computer assisted instruction, in which 
students progress at their own rate and there is a need for frequent measurement 
of achievement. Along this line, the computerized test was found to necessitate 
significantly fewer items and needed less time to administer to each examinee. 
Ready availability of forms of exams for tailored tests, as a result of its adaptive 
nature (Whitely and Dawis, 1974), alleviates burdensome paper work in facilitating 
the evaluation of students' progress in a given course of instruction. 

Inasmuch as computerized tailored testing has been demonstrated not to 
affect overall achievement in and of itself, the advantage of frequent and 
immediate feedback to the learner can be gained by use of this type of exam. 
In short, computerized testing is becoming more and more feasible, and this study 
demonstrates it to be a realistic alternative. 
Table 3 

Attitude Items Response Data 

                                                      Response     Value* 
                                                      Frequency    Assigned 

1. Compared to multiple-choice tests, the tailored test has 

   (a) more time pressure.                                 8           1 
   (b) less time pressure.                                48           3 
   (c) about equal time pressure.                         18           2 

2. Compared to traditional multiple-choice tests, the tailored test is 

   (a) easier. 
   (b) harder.                                            21           1 
   (c) about as difficult.                                47           2 

3. As compared to the traditional multiple-choice test, 

   (a) I would rather take the tailored test.             42           3 
   (b) I would rather take the traditional test.          22           1 
   (c) I prefer both equally well.                        10           2 

4. Taking the test on the computer makes me 

   (a) more anxious than the traditional test.            27           1 
   (b) less anxious than a traditional test.              19           3 
   (c) about equally as anxious as the 
       traditional test.                                  28           2 

*These values were utilized in coding responses for correlating the items with 
dependent measures. 
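
As a small worked example of the coding scheme noted above, the following Python 
sketch computes the mean coded response for an item from its frequency 
distribution. The frequencies are those listed for items 1 and 3; the function 
name and the use of a mean as the summary are illustrative assumptions, since the 
paper reports only the frequencies and the assigned values. 

```python
def mean_coded_response(freq_by_value):
    """Mean of the assigned values, weighted by response frequency."""
    respondents = sum(freq_by_value.values())
    weighted = sum(value * freq for value, freq in freq_by_value.items())
    return weighted / respondents


# Assigned value -> response frequency, as listed in the table.
item1_time_pressure = {1: 8, 3: 48, 2: 18}
item3_preference = {3: 42, 1: 22, 2: 10}

item1_mean = mean_coded_response(item1_time_pressure)  # 188 / 74
item3_mean = mean_coded_response(item3_preference)     # 168 / 74
```

Both means fall above the midpoint value of 2, in line with the reported trends 
toward less perceived time pressure and a preference for the tailored test. 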